Whistler: a trainable text-to-speech system
Abstract
We introduce Whistler, a trainable Text-to-Speech (TTS) system that automatically learns the model parameters from a corpus. Both prosody parameters and concatenative speech units are derived using probabilistic learning methods that have been used successfully in speech recognition. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style.
Similar resources
Recent improvements on Microsoft's trainable text-to-speech system-Whistler
Whistler Text-to-Speech engine was designed so that we can automatically construct the model parameters from training data. This paper will focus on recent improvements on prosody and acoustic modeling, which are all derived through the use of probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of...
Automatic generation of synthesis units for trainable text-to-speech systems
Whistler Text-to-Speech engine was designed so that we can automatically construct the model parameters from training data. This paper will describe in detail the design issues of constructing the synthesis unit inventory automatically from speech databases. The automatic process includes (1) determining the scaleable synthesis unit which can reflect spectral variations of different allophones;...
Reducing the footprint of the IBM trainable speech synthesis system
This paper presents a novel approach for concatenative speech synthesis. This approach enables reduction of the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A spectral acoustic feature based speech representation is used for computing a cost function during segment selection as well as for speech gen...
Phrase splicing and variable substitution using the IBM trainable speech synthesis system
This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production lying between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application-specific set of pre-recorded phrases. The text to be synthesised is converted to a phone se...